PERCEPTUAL TIME-VARYING MODELLING OF SPEECH SIGNALS FOR ASR COMPRESSION APPLICATION (MonAmOR3)
Authors
Abstract
Perceptual audio coders and Automatic Speech Recognition (ASR) systems are commonly based on short-time analysis. This paper presents a generalized model for time-varying coefficients based on psychoacoustic properties of the human ear. The proposed model is evaluated in the framework of speaker-independent speech recognition using Hidden Markov Models (HMMs). The generalized model is compared to the traditional and most popular feature set, Mel-frequency cepstral coefficients (MFCC). The comparison is made with respect to each model's baud rate and the total error rate measured in an extensive speech recognition experiment. The recognition experiment is built on the well-established speech recognition development environment HTK and uses the TIDIGITS corpus as the evaluation database. The time-varying model achieves a better recognition rate than MFCC, while its baud rate is about one third of that used for MFCC. In addition, a preliminary evaluation of the model's robustness to noise was carried out and is presented.
Similar resources
Noise Robustness of Traditional Features for Macedonian Voice Dialing ASR
Automatic Speech Recognition Systems of today are intensely deployed in real world application scenarios which are often characterized by suboptimal operating conditions. Thus their noise robustness has become a crucial parameter when assessing ASR in-field performance. The paper examines the noise robustness of traditional ASR feature sets as applied to a Voice Dialing Application built for Ma...
A Novel Algorithm of Sparse Representations for Speech Compression/Enhancement and Its Application in Speaker Recognition System
This paper proposes sparse and redundancy representation spectral domain compression of the speech signal using novel sparsing algorithms to the problem of speech compression (SC)/enhancement (SE). In Automatic Speaker Recognition (ASR) sparsification can play a major role to resolve big data issues in speech compression and its storage in the database, where the speech signal can be uncompress...
Progresses in continuous speech recognition based on statistical modelling for Romanian language
In this paper we will present progresses made in Automatic Speech Recognition (ASR) for Romanian language based on statistical modelling with hidden Markov models (HMMs). The progresses concern enhancement of modelling by taking into account the context in form of triphones, improvement of speaker independence by applying a gender specific training and enlargement of the feature categories used...
Continuous-time models for AM-FM signal demodulation and their application to speech recognition
Automatic speech recognition (ASR) systems can benefit from including into their acoustic processing part new features that account for various nonlinear and time-varying phenomena during speech production. In this paper, we develop robust continuous-time expansions used to demodulate the instantaneous amplitudes and frequencies of the speech resonances and extract novel acoustic features from s...
Plasticity in Systems for Automatic Speech Recognition: A Review
Although the topic ‘plasticity in speech perception’ is primarily concerned with the malleability of human speech perceptual behaviour, it may be illuminating to consider in parallel the degree to which current state-of-the-art ‘automatic speech recognition’ (ASR) systems also change their behaviour over time. This paper provides a review of the computational mechanisms underlying contemporary ...